DETAILED ACTION

Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

2. Claims 1, 3-7, 9, 11-15, 17, 19, and 20 are rejected under 35 U.S.C. 102(b)
as being anticipated by Johnson et al. (US PAP 2003/0167172).

As per claims 1, 9, and 17, Johnson et al. teach a method for determining when
a user has ceased inputting data, the method comprising the steps of:

receiving a plurality of user inputs (multiple inputs entered; paragraph 21, lines
13-15; paragraph 64, lines 1-3);

determining a content of the input for each of the user inputs ("uttering direction
from here to there, and the agent program fills the starting location field with here and
the destination field with there"), and determining a mode of input for each of the user
inputs (filling the fields by the voice browser implies determining a voice mode of the
input; paragraph 35, lines 6-25);

accessing a plurality of templates from a database (uttering here to there and the
voice browser fills the starting location field with here and the destination location field
with there, while the graphical browser fills the starting and the destination location
fields with geographical locations by the mode of a clicking point on the map, which
implies accessing a plurality of templates from a database; paragraph 57, lines 5-11,
paragraph 35, lines 6-20); and

determining that the user has ceased inputting data (detecting terminating
sessions) if the user's inputs fill any template from the database (filling the starting
location field and the destination location field with "here and there" implies filling a
template; paragraph 25, lines 12-15, paragraph 35, lines 6-25).
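
For illustration only, the claimed steps discussed above (receiving inputs, determining the content and mode of each, accessing templates, and treating a filled template as the end of input) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Johnson et al. or from the application: the names `UserInput`, `Template`, `filled_template`, and `has_ceased_input`, and the representation of a template as a set of (slot, mode) pairs, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserInput:
    slot: str     # field the content fills, e.g. "start" (hypothetical naming)
    content: str  # recognized content, e.g. "here"
    mode: str     # input mode, e.g. "voice" or "click"


@dataclass(frozen=True)
class Template:
    name: str
    required: frozenset  # (slot, mode) pairs that must all be present


def filled_template(inputs, templates):
    """Return the first template whose (slot, mode) pairs are all covered, else None."""
    seen = {(i.slot, i.mode) for i in inputs}
    for template in templates:
        if template.required <= seen:
            return template
    return None


def has_ceased_input(inputs, templates):
    """Treat the user as done once the inputs fill any template."""
    return filled_template(inputs, templates) is not None


# Hypothetical templates mirroring the route example: one spoken, one clicked.
TEMPLATES = [
    Template("spoken-route", frozenset({("start", "voice"), ("destination", "voice")})),
    Template("clicked-route", frozenset({("start", "click"), ("destination", "click")})),
]
```

Under these assumptions, a single spoken "here" leaves every template unfilled, while a spoken "here" followed by a spoken "there" fills the hypothetical `spoken-route` template and input is treated as complete.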

As per claims 3 and 11, Johnson et al. further disclose that receiving the input from
the user comprises the step of receiving a multi-modal input from the user (voice browser
and graphical browser; paragraph 35, lines 6-20).

As per claims 4 and 12, Johnson et al. further disclose receiving a multi-modal
input from the group consisting of a text input, a speech input, and a handwritten
input (enter information in text mode and voice mode, handwriting recognition;
paragraph 34, lines 1-3, paragraph 22, line 27).

As per claims 5, 13, and 19, Johnson et al. further disclose accessing a plurality
of semantic templates ("determine the word "here" corresponds to geographical location
and "there" corresponds to the geographical location" implies accessing a plurality of
semantic templates, since the multimodal fusion engine determines corresponding
inputs between different modalities; paragraph 35, lines 6-20).

As per claims 6 and 14, Johnson et al. further disclose accessing a plurality of
templates (uttering here to there and the voice browser fills the starting location field
with here and the destination location field with there, while the graphical browser fills
the starting and the destination location fields with geographical locations by clicking
point on the map, implies accessing a plurality of templates) comprising combinations of
possible user inputs and their possible mode of input (multiple inputs entered through
different modalities; paragraph 37, lines 3-6, paragraph 64, lines 1-3; paragraph 35,
lines 6-20).

As per claims 7, 15, and 20, Johnson et al. further disclose dynamically updating
templates (uttering here to there and the voice browser fills the starting location field
with here and the destination location field with there, while the graphical browser fills
the starting and the destination location fields with geographical locations by clicking
point on the map, implies having a plurality of templates) from a database [the
synchronization coordinator (used to forward the received information to the multimodal
fusion engine) refrains from obtaining modality-specific instructions for those modalities
identified to be muted, which implies dynamically updating templates from the database by
muting specific modes; paragraph 25, paragraph 30, lines 10-12; paragraph 35, lines 6-20].

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the
invention was made to a person having ordinary skill in the art to which said subject matter pertains.
Patentability shall not be negatived by the manner in which the invention was made.

4. Claims 2, 10, and 18 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Johnson et al. (US Patent 6,807,529).

As per claims 2, 10, and 18, Johnson et al. do not disclose determining that
the user has ceased inputting data if a predetermined amount of time has passed.
However, Johnson et al. disclose that if a user is expected to enter both voice
and text information concurrently but the multimodal fusion engine does not receive the
information for fusing within a period of time, it will assume that an error has occurred
(paragraph 48, lines 11-15). One having ordinary skill in the art would have found it
obvious to determine that the user has ceased inputting data if the predetermined
amount of time has passed within the method of Johnson et al., because that would let the
multimodal fusion engine allow more time to elapse for voice information to be returned
than for text information (paragraph 48, lines 15-18).
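
For illustration only, the time-based determination discussed above can be sketched with a per-mode waiting period, so that voice information is allowed more time than text information. This is a minimal sketch under stated assumptions: the function name `input_timed_out`, the dictionary `DEFAULT_TIMEOUTS`, and the timeout values are hypothetical and do not appear in Johnson et al. or in the application.

```python
# Hypothetical waiting periods in seconds; voice is given longer than text.
DEFAULT_TIMEOUTS = {"voice": 5.0, "text": 2.0}


def input_timed_out(last_input_time, now, mode, timeouts=DEFAULT_TIMEOUTS):
    """Return True once the waiting period for the given mode has elapsed.

    last_input_time and now are timestamps in seconds on the same clock.
    """
    return (now - last_input_time) >= timeouts[mode]
```

With these assumed values, three seconds of silence ends a text entry but not a voice entry, which has a longer window.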

6. Claims 8 and 16 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Johnson et al. (US Patent 6,807,529) as applied to claims 7 and 15, and in view
of applicant's admitted prior art.

As per claims 8 and 16, Johnson et al. further disclose dynamically updating
templates based on a characteristic taken from the group consisting of available modes
of input (voice browser and graphical browser), an expected response from the user
(expected information to be input), and the history and the current status of the task that
the user is working on (providing information about which field has been filled by the
user during a previous concurrent multimodal session state, and current state;
paragraph 34, lines 20-23; paragraph 35, lines 6-11; paragraph 57, lines 5-12).
However, Johnson et al. do not specifically teach a list of discourse obligations that
constrain what the user can input in the next dialog turn.

Application/Control Number: 10/767,422
Art Unit: 2626

In the same field of endeavor, applicant's admitted prior art teaches that checking 
discourse obligations is known and has been used in state-of-the-art dialog systems. 

Therefore, it would have been obvious to one of ordinary skill in the art at the time
the invention was made to use a list of discourse obligations, as taught by applicant's
admitted prior art, in Johnson et al., because that would improve the system by
predicting users' inputs.

Conclusion 

7. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Azvine et al. (US Patent 6,779,060) teach a multimodal user interface, wherein
user inputs may be made in various different ways.

Yamagushi et al. (US Patent 6,345,111) teach a multi-modal interface apparatus
and method, wherein the user inputs at least one medium of sound information,
character information, image information, and operation information through a media
input selection.

Prevost et al. (US 6,570,555) teach a method and apparatus for embodied
conversational characters with multimodal input/output in an interface device.

8. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Leonard Saint-Cyr whose telephone number is (571)
272-4247. The examiner can normally be reached on Monday-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone
number for the organization where this application or proceeding is assigned is (571)
273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 